3  Characterizing the water quality of the Piscataquog river & headwater streams

Let’s start by making sure we have loaded our libraries.

# read libraries
library(readr)
library(janitor)
library(lubridate)
library(tidyr)
library(dplyr)
library(ggplot2)
library(patchwork)
library(knitr)

3.1 EPA Ecosystem health assessments of streams and rivers.

From 2000 - 2004 the EPA conducted the Wadeable streams Assessment

The Wadeable Streams Assessment (WSA) is a first-ever statistically-valid survey of the biological condition of small streams throughout the U.S. EPA worked with the states to conduct the assessment in 2000-2004. 1,392 sites were selected at random to represent the condition of all streams in regions that share similar ecological characteristics. Participants used the same standardized methods at all sites, to ensure results that are comparable across the nation.

You can view the full report here. It goes into a lot of detail but especially in the early sections you can see how they divided the country into different ecoregion and some of the key characteristics that distinguish them.

From 2018 - 2019 the EPA conducted the third national streams and river assessment. This had a much broader scope in terms of the types of water bodies included.

As we know getting data from state and federal agencies can be tricky. The EPA has one database that contains the data from the various national aquatic surveys here.

3.2 Data from the wadeable streams assessment

Let’s load the data from the wadeable streams assessment and determine how it is organized.

# read in data set
wadeable <- read_delim("data/epa_water-chemistry_wadeable-streams.txt",
                       delim = "\t") %>%
  clean_names()

# look at first few rows
head(wadeable) %>%
  kable()
eparegion site_name state county ntl ptl mmi_wsabest x8 x9
REGION__1 AEGAN BROOK MAINE AROOSTOOK 150 1 67.68773 NA NA
REGION__1 PETITE BROOK MAINE AROOSTOOK 245 6 69.77790 NA NA
REGION__1 HALFWAY BROOK MAINE AROOSTOOK 351 4 67.68792 NA NA
REGION__1 DEPOT STREAM MAINE AROOSTOOK 401 5 86.01354 NA NA
REGION__1 SNARE BROOK MAINE PISCATAQUIS 268 6 75.41919 NA NA
REGION__1 R. DE CHUTE MAINE AROOSTOOK 439 8 42.86933 NA NA

Let’s take a look at Nitrogen and get a summary of the mean levels:

# summarize Nitrogen by epa region
nitrogen <- wadeable %>%
  group_by(eparegion) %>%
  summarize(mean_N = mean(ntl),
            median_N = median(ntl),
            sd_N = sd(ntl))

We can create a pretty table with that information:

kable(
  nitrogen,
  digits = 2
)
eparegion mean_N median_N sd_N
REGION_10 178.41 99.0 392.45
REGION__1 452.36 322.0 382.89
REGION__2 979.97 681.0 855.63
REGION__3 1093.56 566.0 1726.59
REGION__4 613.94 396.0 641.95
REGION__5 2815.30 1000.5 4733.40
REGION__6 789.49 560.0 748.81
REGION__7 6142.67 3450.0 7432.96
REGION__8 822.93 370.0 1642.83
REGION__9 755.99 174.0 3396.44

Alternatively we can use a boxplot to look at the distribution of observed values. To make it easier to see differences among regions we will use a logarithmic scale on the y-axis.

ggplot(wadeable, aes(x = eparegion, y = ntl)) +
  geom_boxplot() +
  scale_y_log10() +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Give it a try

Use your new found skills to compare the mean, meadian and standard deviation for Phosphorus. If you are feeling especially ambitious, see if you can figure out how to determine minimum and maximum values as well.

Then create boxplot showing the distribution of observed total phosphorous in wadeable streams of each EPA ecoregion.

Compare the different ecoregions and what you know about those areas, including the land cover to speculate what might cause some of the differences. Compare this to the information in the EPA factsheets that discuss possible sources as well as what different levels indicate about the ecosystem health.

We measured nitrogen and phophorous levels in our headwater streams - how do they compare to other wadeable streams in our and other regions? What do you conclude about the ecosystem health of our headwater streams and the Piscataquog River?

In addition to the distribution of observed values of nitrogen and phosphorus, we might also want to know about the relationship of Nitrogen and Phosphorus with the MMI index that is based on the macroinvertebrates observed in those wadeable streams.

Consider this

Before we look at the data, discuss with your group what you expect the relationship to look like and sketch and figure depicting your expectations.

We are going to calculate the logarithm for both Nitrogen and Phosphorus before we plot them.

nutrients <- wadeable %>%
  mutate(log_N = log10(ntl),
         log_P = log10(ptl))

Now we can compare Nitrogen and Phosphorous

# Nitrogen
ggplot(nutrients, aes(x = log_N, y = mmi_wsabest)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic()

Give it a try

Use your new found skills to plot the relationship of Phosphorus with the Macroinvertebrate index.

Then describe your results and discuss them in light of your expectations and what you have learned about these metrics from the EPA fact sheets.

3.3 Environmental parameters of our headwater streams

We typically measure temperature, pH, dissolved oxygen, conductivity, TDS (total dissolved sediment) when we visit our headwater monitoring sites.

Let’s read in data from the past few years.

# read data
headwaters <- read_delim("data/environmental-param.tsv",
                         delim = "\t") %>%
  clean_names()

# check data
head(headwaters)
# A tibble: 6 × 10
  date      stream_name site_code tds_ppm temp_c s_cond conductivity_u_s   p_h
  <chr>     <chr>       <chr>       <dbl>  <dbl>  <dbl>            <dbl> <dbl>
1 5/26/2021 Avery Brook AVB          18.9   13.5   NA               21.7  6.45
2 7/16/2021 Avery Brook AVB          15.7   17     23.4             19.8  6.09
3 6/14/2023 Avery Brook AVB          NA     13     24.6             18.9  6.23
4 5/24/2024 Avery Brook AVB          15.9   13.2   22.9             17.8  6.05
5 6/12/2024 Avery Brook AVB          18.3   12.9   26.4             20.2  6.54
6 8/22/2024 Avery Brook AVB          21.4   14     29.3             23.1  6.9 
# ℹ 2 more variables: do_sat <dbl>, do_mg_l <dbl>

We most typically visit in the summer, early fall and late spring and in some years sites were visited more regularly than others. You’ll also notice that there is some missing data - this can be the result of equipment not working as expected or change in the instrumentation but for our intents in purposes we should be okay.

Let’s take a look at the distribution of TDS for each site. First let’s calculate the mean and standard deviation1:

  • 1 Because we have some locations with missing data, we will need to add an argument to the equation that explicitely tells R to ignore those missing values.

  • headwaters %>%
      group_by(stream_name) %>%
      summarize(mean_TDS = mean(tds_ppm, na.rm = TRUE),
                sd_TDS = sd(tds_ppm, na.rm = TRUE)) %>%
      arrange(desc(mean_TDS)) %>%
      kable(digits = 2)
    stream_name mean_TDS sd_TDS
    Piscataquog 70.70 0.57
    Back Mountain Brook 42.95 10.42
    Rand Brook 36.79 5.02
    Lower Brennan Brook 34.45 6.01
    Swenson Brook 26.73 4.49
    Hardwick Brook 25.88 5.18
    Lily Pad Pond Brook 25.40 2.42
    Scataquog Brook 22.22 4.39
    Brennan Falls 19.20 1.27
    Whiting Brook 18.45 5.04
    School House Brook 18.36 3.97
    Avery Brook 18.04 2.35
    Rachel Brook 17.75 1.33
    Bicknell Brook 16.50 3.98
    Brennan Brook 15.52 3.66
    French Brook 14.38 2.69

    Now let’s compare the distribution using boxplots to get a better idea of the distribution of observed values across different visits.

    In addition to the boxes, we are going to plot the individual points - this is helpful because some of our sites have a lot more observations compared to others, so it is a more honest depiction of our results. Instead of using geom_point() we will use geom_jitter() which means that points are randomly “jittered” on the x-axis so they are more easy to distinguish.

    ggplot(headwaters, aes(x = stream_name, y = tds_ppm)) +
      geom_boxplot(outlier.shape = NULL) +
      geom_jitter() +
      theme_classic() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1))

    Give it a try

    Use your new found skills to calculate the mean and standard deviation for other parameters as well as their distributions.

    Describe your results and discuss them in the context of what you have learned from the EPA factsheets.

    3.4 Data from streams and rivers assessment 2019

    # read data
    rivers <- read_delim("data/epa_water-chemistry_streams-rivers_2019.csv",
                         delim = ",") %>%
      clean_names()
    
    # column names
    colnames(rivers)
      [1] "uid"                             "publication_date"               
      [3] "site_id"                         "date_col"                       
      [5] "visit_no"                        "ag_eco9"                        
      [7] "epa_reg"                         "state"                          
      [9] "chem_not_collected"              "wchl_not_collected"             
     [11] "lab"                             "matrix"                         
     [13] "sam_code"                        "chem_date_received"             
     [15] "chem_lab_sample_id"              "ammonia_n_batch_id"             
     [17] "ammonia_n_date_analyzed"         "ammonia_n_dilution_factor"      
     [19] "ammonia_n_holding_time"          "ammonia_n_mdl"                  
     [21] "ammonia_n_nars_flag"             "ammonia_n_qa_flag"              
     [23] "ammonia_n_result"                "ammonia_n_rl"                   
     [25] "ammonia_n_units"                 "anc_batch_id"                   
     [27] "anc_date_analyzed"               "anc_holding_time"               
     [29] "anc_mdl"                         "anc_nars_flag"                  
     [31] "anc_qa_flag"                     "anc_result"                     
     [33] "anc_rl"                          "anc_units"                      
     [35] "calcium_batch_id"                "calcium_date_analyzed"          
     [37] "calcium_dilution_factor"         "calcium_holding_time"           
     [39] "calcium_mdl"                     "calcium_nars_flag"              
     [41] "calcium_qa_flag"                 "calcium_result"                 
     [43] "calcium_rl"                      "calcium_units"                  
     [45] "chloride_batch_id"               "chloride_date_analyzed"         
     [47] "chloride_dilution_factor"        "chloride_holding_time"          
     [49] "chloride_mdl"                    "chloride_nars_flag"             
     [51] "chloride_qa_flag"                "chloride_result"                
     [53] "chloride_rl"                     "chloride_units"                 
     [55] "color_batch_id"                  "color_date_analyzed"            
     [57] "color_holding_time"              "color_mdl"                      
     [59] "color_nars_flag"                 "color_qa_flag"                  
     [61] "color_result"                    "color_rl"                       
     [63] "color_units"                     "cond_batch_id"                  
     [65] "cond_date_analyzed"              "cond_holding_time"              
     [67] "cond_mdl"                        "cond_nars_flag"                 
     [69] "cond_qa_flag"                    "cond_result"                    
     [71] "cond_rl"                         "cond_units"                     
     [73] "doc_batch_id"                    "doc_date_analyzed"              
     [75] "doc_dilution_factor"             "doc_holding_time"               
     [77] "doc_mdl"                         "doc_nars_flag"                  
     [79] "doc_qa_flag"                     "doc_result"                     
     [81] "doc_rl"                          "doc_units"                      
     [83] "magnesium_batch_id"              "magnesium_date_analyzed"        
     [85] "magnesium_dilution_factor"       "magnesium_holding_time"         
     [87] "magnesium_mdl"                   "magnesium_nars_flag"            
     [89] "magnesium_qa_flag"               "magnesium_result"               
     [91] "magnesium_rl"                    "magnesium_units"                
     [93] "nitrate_n_batch_id"              "nitrate_n_date_analyzed"        
     [95] "nitrate_n_dilution_factor"       "nitrate_n_holding_time"         
     [97] "nitrate_n_mdl"                   "nitrate_n_nars_flag"            
     [99] "nitrate_n_qa_flag"               "nitrate_n_result"               
    [101] "nitrate_n_rl"                    "nitrate_n_units"                
    [103] "nitrate_nitrite_n_batch_id"      "nitrate_nitrite_n_date_analyzed"
    [105] "nitrate_nitrite_n_holding_time"  "nitrate_nitrite_n_mdl"          
    [107] "nitrate_nitrite_n_nars_flag"     "nitrate_nitrite_n_qa_flag"      
    [109] "nitrate_nitrite_n_result"        "nitrate_nitrite_n_rl"           
    [111] "nitrate_nitrite_n_units"         "nitrite_n_batch_id"             
    [113] "nitrite_n_date_analyzed"         "nitrite_n_dilution_factor"      
    [115] "nitrite_n_holding_time"          "nitrite_n_mdl"                  
    [117] "nitrite_n_nars_flag"             "nitrite_n_qa_flag"              
    [119] "nitrite_n_result"                "nitrite_n_rl"                   
    [121] "nitrite_n_units"                 "ntl_batch_id"                   
    [123] "ntl_date_analyzed"               "ntl_diss_batch_id"              
    [125] "ntl_diss_date_analyzed"          "ntl_diss_holding_time"          
    [127] "ntl_diss_mdl"                    "ntl_diss_nars_flag"             
    [129] "ntl_diss_qa_flag"                "ntl_diss_result"                
    [131] "ntl_diss_units"                  "ntl_holding_time"               
    [133] "ntl_mdl"                         "ntl_nars_flag"                  
    [135] "ntl_qa_flag"                     "ntl_result"                     
    [137] "ntl_rl"                          "ntl_units"                      
    [139] "ph_batch_id"                     "ph_date_analyzed"               
    [141] "ph_holding_time"                 "ph_nars_flag"                   
    [143] "ph_qa_flag"                      "ph_result"                      
    [145] "ph_units"                        "potassium_batch_id"             
    [147] "potassium_date_analyzed"         "potassium_dilution_factor"      
    [149] "potassium_holding_time"          "potassium_mdl"                  
    [151] "potassium_nars_flag"             "potassium_qa_flag"              
    [153] "potassium_result"                "potassium_rl"                   
    [155] "potassium_units"                 "ptl_batch_id"                   
    [157] "ptl_date_analyzed"               "ptl_diss_batch_id"              
    [159] "ptl_diss_date_analyzed"          "ptl_diss_holding_time"          
    [161] "ptl_diss_mdl"                    "ptl_diss_nars_flag"             
    [163] "ptl_diss_qa_flag"                "ptl_diss_result"                
    [165] "ptl_diss_units"                  "ptl_holding_time"               
    [167] "ptl_mdl"                         "ptl_nars_flag"                  
    [169] "ptl_qa_flag"                     "ptl_result"                     
    [171] "ptl_rl"                          "ptl_units"                      
    [173] "silica_batch_id"                 "silica_date_analyzed"           
    [175] "silica_holding_time"             "silica_mdl"                     
    [177] "silica_nars_flag"                "silica_qa_flag"                 
    [179] "silica_result"                   "silica_rl"                      
    [181] "silica_units"                    "sodium_batch_id"                
    [183] "sodium_date_analyzed"            "sodium_dilution_factor"         
    [185] "sodium_holding_time"             "sodium_mdl"                     
    [187] "sodium_nars_flag"                "sodium_qa_flag"                 
    [189] "sodium_result"                   "sodium_rl"                      
    [191] "sodium_units"                    "sulfate_batch_id"               
    [193] "sulfate_date_analyzed"           "sulfate_dilution_factor"        
    [195] "sulfate_holding_time"            "sulfate_mdl"                    
    [197] "sulfate_nars_flag"               "sulfate_qa_flag"                
    [199] "sulfate_result"                  "sulfate_rl"                     
    [201] "sulfate_units"                   "tkn_batch_id"                   
    [203] "tkn_date_analyzed"               "tkn_holding_time"               
    [205] "tkn_mdl"                         "tkn_nars_flag"                  
    [207] "tkn_qa_flag"                     "tkn_result"                     
    [209] "tkn_rl"                          "tkn_units"                      
    [211] "tss_batch_id"                    "tss_date_analyzed"              
    [213] "tss_holding_time"                "tss_mdl"                        
    [215] "tss_nars_flag"                   "tss_qa_flag"                    
    [217] "tss_result"                      "tss_rl"                         
    [219] "tss_units"                       "turb_batch_id"                  
    [221] "turb_date_analyzed"              "turb_holding_time"              
    [223] "turb_mdl"                        "turb_nars_flag"                 
    [225] "turb_qa_flag"                    "turb_result"                    
    [227] "turb_rl"                         "turb_units"                     
    [229] "wchl_date_received"              "wchl_lab_sample_id"             
    [231] "chla_batch_id"                   "chla_date_analyzed"             
    [233] "chla_holding_time"               "chla_mdl"                       
    [235] "chla_nars_flag"                  "chla_result"                    
    [237] "chla_rl"                         "chla_units"                     
    [239] "chla_volume_filtered"            "pheo_batch_id"                  
    [241] "pheo_date_analyzed"              "pheo_holding_time"              
    [243] "pheo_nars_flag"                  "pheo_result"                    
    [245] "pheo_rl"                        
    Give it a try

    Take a look at the information that is contained within this assessment and discuss with your group which ones you think will be the most beneficial to compare to get context to compare our headwater streams to other streams in our ecoregion as well as other ecoregions in general.